UNCERTAINTY                                                                                             129

be reserved for upper-grade secondary courses in probability and
statistics, and no attempt should be made to present more than a few
specific procedures. Particularly in the case of significance tests, a
formal approach obscures the reasoning to such an extent that it may be
better to avoid hypotheses and test statistics altogether.

Confidence Intervals

The reasoning behind confidence statements is relatively
straightforward. What is more, news reports of opinion polls and their
margins of error provide a steady supply of examples for discussion.
How is it that a sample of only 1500 people can accurately represent
the opinion of 185 million American adults? Random sampling provides
part of the explanation; sampling distributions provide the rest, and
confidence intervals explain what the margin of error means.
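To make the poll question concrete, here is a minimal Python sketch. It assumes the reported margin of error is a 95% figure computed as two standard deviations of the sample proportion, and it uses the worst case of an even split (proportion 0.5). The function name `margin_of_error` is hypothetical, chosen only for illustration.

```python
import math

def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error for a sample proportion.

    Two standard deviations of the sampling distribution, 2 * sqrt(p(1-p)/n);
    p = 0.5 gives the largest (most conservative) value.
    """
    return 2 * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1500):.3f}")  # prints 0.026, about 2.6 percentage points
```

Notice that the population size never enters the calculation. For a population as large as 185 million, the accuracy of the poll depends on the size of the sample, not on what fraction of the population was sampled.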

Confidence statements can be introduced after students have some
experience with simulation of sampling distributions. The distinction
between population and sample, the idea of random sampling, and the
notion of a sampling distribution are fundamental to inference.
Simulation allows the gradual introduction of confidence intervals
during the exploration of sampling and sampling distributions. The
ideas of confidence intervals can be taught via graphical display of
simulated samples.¹⁰ A more formal approach requires familiarity with
normal distributions.
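One possible graphical display is a text plot of many simulated intervals. The sketch below is a hypothetical illustration, not the method of the cited source. It borrows the driving-to-school setting described next (true proportion 0.3, samples of 250). It builds each interval as p̂ ± 2√(p̂(1 − p̂)/n), using the sample proportion in place of the unknown p, and then reports whether each interval catches the true value. The names `simulate_interval` and `draw`, the seed, and the plotting scale are all illustrative assumptions.

```python
import math
import random

P_TRUE = 0.3   # population proportion (known here only so coverage can be checked)
N = 250        # size of each simple random sample

def simulate_interval(rng, p=P_TRUE, n=N):
    """Draw one sample and return p_hat +/- 2 estimated standard deviations."""
    yes = sum(rng.random() < p for _ in range(n))
    p_hat = yes / n
    half_width = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def draw(interval, lo=0.2, hi=0.4, width=50):
    """Render an interval as dashes on a scale from lo to hi; '|' marks P_TRUE."""
    def col(x):
        return max(0, min(width - 1, round((x - lo) / (hi - lo) * (width - 1))))
    row = [" "] * width
    for i in range(col(interval[0]), col(interval[1]) + 1):
        row[i] = "-"
    row[col(P_TRUE)] = "|"
    return "".join(row)

rng = random.Random(1)
intervals = [simulate_interval(rng) for _ in range(20)]
for iv in intervals:
    covers = iv[0] <= P_TRUE <= iv[1]
    print(draw(iv), "covers" if covers else "MISSES")
hits = sum(iv[0] <= P_TRUE <= iv[1] for iv in intervals)
print(f"{hits} of {len(intervals)} intervals contain p = {P_TRUE}")
```

Most, but typically not all, of the 20 intervals should cover the true value. Rerunning with other seeds lets students watch the "about 95%" success rate emerge.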

Suppose that in a large county 30% of high school students drive cars
to school. Asking a simple random sample of 250 students whether they
drove to school today produces 250 independent observations, each with
probability 0.3 of being "yes." The proportion p̂ of "yes" responses in
the sample varies from sample to sample. Simulate (say) 1000 samples
and display the sampling distribution of p̂. It is approximately normal,
with mean 0.3 and standard deviation 0.029. Repeated simulations of
samples of various sizes from this population demonstrate that the
center of the sampling distribution remains at 0.3 and that the spread
shrinks as the sample size grows. In large samples (say, 1000 or more)
the values of the sample statistic p̂ are tightly concentrated around
the population parameter p = 0.3. Students can see empirically that
samples of this size allow good guesses about the entire population.

But just how good are guesses based on a sample? We can quantify the
answer by describing how the statistic p̂ varies in repeated sampling. It is
a basic fact of normal distributions that about 95% of all observations
lie within two standard deviations on either side of the mean. So in
repeated sampling, 95% of all samples of 250 students give a sample